Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Datagen: Cache supported_locales impls #4470

Merged
merged 10 commits into from
Dec 19, 2023
Merged

Conversation

sffc
Copy link
Member

@sffc sffc commented Dec 19, 2023

cargo make bakeddata datetime performance on my machine:

On main: 173.91 seconds.

On this branch: 62.24 seconds.

Still slow but it seems to be substantially faster.

@sffc
Copy link
Member Author

sffc commented Dec 19, 2023

Based on the flamegraph, it seems after this lands the next bottlenecks might be:

  1. impl Display for TokenStream (this seems to be by far the biggest contributor now)
  2. rustfmt (seems to be almost half of the time spent??)
  3. impl Bake for DateSkeletonPatternsV1 (small but big enough to show up)

Copy link
Member

@robertbastian robertbastian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

observation: DatagenDriver already calls supported_locales exactly once per key, so this is a DatagenProvider-internal issue.

@@ -306,6 +309,28 @@ pub struct SourceData {
pub(crate) icuexport_dictionary_fallback: Option<Arc<SerdeCache>>,
#[cfg(feature = "legacy_api")]
pub(crate) collations: Vec<String>,
pub(crate) supported_locales_cache_vec:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use an Arc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to Arc

.map_err(|e| *e)
}

fn supports_locale(&self, locale: &DataLocale) -> Result<bool, DataError> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

supports_locale is called n times per key, whereas supported_locales is only called once. It probably makes more sense to cache a HashSet and convert it to a Vec for the one call, than to linearly scan a Vec n times.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance on cargo make bakeddata datetime (run time only, not build time):

Commit Trial 1 Trial 2 Trial 3
11a1ed1 (Vec) 53.18 61.74 53.82
f2de286 (HashSet) 58.64 51.89 51.03

So it looks like HashSet wins.

@Manishearth
Copy link
Member

rustfmt (seems to be almost half of the time spent??)

n.b. we don't run this in google3 anyway

FrozenMapThrowawayClone<DataKey, Box<Result<Vec<DataLocale>, DataError>>>,
}

pub(crate) struct FrozenMapThrowawayClone<K, V>(pub(crate) FrozenMap<K, V>);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe elsa master has these impls now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does for sync at least.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elsa 1.10 should work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upgraded to elsa 1.10

Copy link
Member

@Manishearth Manishearth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r=me with the change from vec to hashset

@sffc sffc requested a review from a team as a code owner December 19, 2023 17:55
@sffc sffc merged commit bd36f1e into unicode-org:main Dec 19, 2023
29 checks passed
@sffc sffc deleted the datagen-sl-memo branch December 19, 2023 21:05
@sffc
Copy link
Member Author

sffc commented Dec 21, 2023

Changelog in #4484

sffc added a commit that referenced this pull request Dec 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants